Reaction template


SynTwins: A Retrosynthesis-Guided Framework for Synthesizable Molecular Analog Generation

Chen, Shuan, Nam, Gunwook, Aspuru-Guzik, Alan, Jung, Yousung

arXiv.org Artificial Intelligence

The disconnect between AI-generated molecules with desirable properties and their synthetic feasibility remains a critical bottleneck in computational discovery of drugs and materials. While generative AI has accelerated the proposal of candidate molecules, many of these structures prove challenging or impossible to synthesize using established chemical reactions. Here, we introduce SynTwins, a novel retrosynthesis-guided molecule design framework that finds synthetically accessible molecular analogs by emulating expert chemists' strategies in three steps: retrosynthesis, searching similar building blocks, and virtual synthesis. Using a search algorithm instead of a stochastic data-driven generator, SynTwins outperforms state-of-the-art machine learning models at exploring synthetically accessible analogs while maintaining high structural similarity to original target molecules. Furthermore, when integrated into existing molecular property-optimization frameworks, our hybrid approach produces synthetically feasible analogs with minimal loss in property scores. Our comprehensive benchmarking across diverse molecular datasets demonstrates that SynTwins effectively bridges the gap between computational design and experimental synthesis, providing a practical solution for accelerating the discovery of synthesizable molecules with desired properties for a wide range of applications.
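The second SynTwins step, searching for building blocks similar to those found by retrosynthesis, can be illustrated as a nearest-neighbour lookup by Tanimoto similarity. This is a minimal sketch, assuming each building block is already encoded as a set of fingerprint bit indices. The catalog contents, `similar_building_blocks`, and the `k`/`min_sim` parameters are hypothetical, and the abstract does not state which fingerprint or search settings SynTwins actually uses.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_building_blocks(
    query: Set[int],
    catalog: Dict[str, Set[int]],
    k: int = 3,
    min_sim: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k catalog entries most similar to a retrosynthesis fragment."""
    scored = [(name, tanimoto(query, bits)) for name, bits in catalog.items()]
    scored = [item for item in scored if item[1] >= min_sim]
    # Highest similarity first; ties broken by name so the output is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical toy catalog: small bit sets stand in for real fingerprints.
catalog = {
    "BB-001": {1, 2, 3, 4},
    "BB-002": {1, 2, 3, 9},
    "BB-003": {7, 8, 9},
}
print(similar_building_blocks({1, 2, 3, 4}, catalog, k=2))
```

In a full pipeline, each retrieved block would then be recombined with the other precursors by applying the same reaction in the forward direction (the "virtual synthesis" step). That step needs a cheminformatics toolkit and is not shown here.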



Atom-anchored LLMs speak Chemistry: A Retrosynthesis Demonstration

Hassen, Alan Kai, Bernatavicius, Andrius, Janssen, Antonius P. A., Preuss, Mike, van Westen, Gerard J. P., Clevert, Djork-Arné

arXiv.org Artificial Intelligence

Applications of machine learning in chemistry are often limited by the scarcity and expense of labeled data, restricting traditional supervised methods. In this work, we introduce a framework for molecular reasoning using general-purpose Large Language Models (LLMs) that operates without requiring labeled training data. Our method anchors chain-of-thought reasoning to the molecular structure by using unique atomic identifiers. First, the LLM performs a one-shot task to identify relevant fragments and their associated chemical labels or transformation classes. In an optional second step, this position-aware information is used in a few-shot task with provided class examples to predict the chemical transformation. We apply our framework to single-step retrosynthesis, a task where LLMs have previously underperformed. Across academic benchmarks and expert-validated drug discovery molecules, our work enables LLMs to achieve high success rates in identifying chemically plausible reaction sites (≥90%), named reaction classes (≥40%), and final reactants (≥74%). Beyond solving complex chemical tasks, our work also provides a method to generate theoretically grounded synthetic datasets by mapping chemical knowledge onto the molecular structure and thereby addressing data scarcity.
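To make "unique atomic identifiers" concrete, the sketch below tokenizes a SMILES string and numbers every atom in reading order, so a prompt can refer to "atom 5" unambiguously. This is an illustration only. It assumes sequential numbering and a hypothetical `index:atom` notation, and it does not reproduce the paper's actual identifier scheme or prompts. The tokenizer regex follows the pattern commonly used for SMILES in sequence models.

```python
import re
from typing import List, Tuple

# Regex-based SMILES tokenizer: bracket atoms, two-letter halogens, organic-subset
# atoms (upper and lower case), bonds, branches and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
# Tokens that denote atoms (as opposed to bonds, branches or ring closures).
ATOM_TOKEN = re.compile(r"\[[^\]]+]|Br|Cl|[BCNOSPFI]|[bcnosp]")


def anchor_atoms(smiles: str) -> List[Tuple[int, str]]:
    """Assign each atom token a unique 1-based identifier in SMILES order."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters it cannot match, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    atoms = [tok for tok in tokens if ATOM_TOKEN.fullmatch(tok)]
    return list(enumerate(atoms, start=1))


def anchored_prompt_line(smiles: str) -> str:
    """Render atoms as 'index:atom' pairs (hypothetical prompt format)."""
    return " ".join(f"{idx}:{atom}" for idx, atom in anchor_atoms(smiles))


print(anchored_prompt_line("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

The SMILES string itself is left unchanged. Rewriting organic-subset atoms as bracket atoms (for example `C` to `[C:1]`) would silently drop their implicit hydrogens. For that reason, the identifiers are kept in a separate listing rather than injected into the string.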



TempRe: Template generation for single and direct multi-step retrosynthesis

Xuan-Vu, Nguyen, Armstrong, Daniel P, Jončev, Zlatko, Schwaller, Philippe

arXiv.org Artificial Intelligence

Retrosynthesis planning remains a central challenge in molecular discovery due to the vast and complex chemical reaction space. While traditional template-based methods offer tractability, they suffer from poor scalability and limited generalization, and template-free generative approaches risk generating invalid reactions. In this work, we propose TempRe, a generative framework that reformulates template-based approaches as sequence generation, enabling scalable, flexible, and chemically plausible retrosynthesis. We evaluated TempRe across single-step and multi-step retrosynthesis tasks, demonstrating its superiority over both template classification and SMILES-based generation methods. On the PaRoutes multi-step benchmark, TempRe achieves strong top-k route accuracy. Furthermore, we extend TempRe to direct multi-step synthesis route generation, providing a lightweight and efficient alternative to conventional single-step and search-based approaches. These results highlight the potential of template generative modeling as a powerful paradigm in computer-aided synthesis planning.
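The "top-k route accuracy" reported above is the fraction of targets whose reference route appears among the model's first k ranked predictions. The sketch below assumes that routes have already been converted to canonical strings, so exact matching is meaningful. PaRoutes defines its own route-comparison procedure, which is not reproduced here.

```python
from typing import Dict, Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Fraction of targets whose reference appears in the top-k ranked predictions."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    result = {}
    for k in ks:
        hits = sum(1 for ranked, ref in zip(predictions, references) if ref in ranked[:k])
        result[k] = hits / len(references)
    return result


# Toy example: three targets, each with a ranked list of canonical route strings.
preds = [["A", "B", "C"], ["X", "Y", "Z"], ["Q", "R", "S"]]
refs = ["A", "Z", "T"]
print(top_k_accuracy(preds, refs, ks=(1, 3)))
```

Because the metric only checks membership in a prefix of the ranked list, top-k accuracy can never decrease as k grows.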


Reviews: Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network

Neural Information Processing Systems

Summary: This work provides a novel approach to predicting the outcome of organic chemical reactions. A reaction can be computationally regarded as a graph-prediction problem: given several connected input graphs (molecules), the model aims to predict a connected graph (the reaction product) that can be obtained by performing several graph edits (the reaction) on some edges and nodes (the reaction center) of the input graphs. Past reaction prediction methods involved exhaustive enumeration of reaction centers and matching them against a large number of existing reaction templates, which is very inefficient and hard to scale. In this work, the authors propose a template-free method to predict the outcome. It is a three-step pipeline: 1) identify the reaction center given the input graphs using a Weisfeiler-Lehman Network.
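The Weisfeiler-Lehman Network builds on the relabeling idea of the Weisfeiler-Lehman graph-isomorphism test. In that test, each atom's label is repeatedly replaced by a signature of its own label and its neighbours' labels. The sketch below shows only this classic discrete relabeling, with no learned weights, so it is not the reviewed model itself. The graph encoding (atom index mapped to a list of neighbour and bond-symbol pairs) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Adjacency list: atom index -> [(neighbour index, bond symbol)].
Graph = Dict[int, List[Tuple[int, str]]]


def build_graph(n_atoms: int, bonds: List[Tuple[int, int, str]]) -> Graph:
    """Build an undirected molecular graph from (atom_a, atom_b, bond_symbol) triples."""
    graph: Graph = {i: [] for i in range(n_atoms)}
    for a, b, bond in bonds:
        graph[a].append((b, bond))
        graph[b].append((a, bond))
    return graph


def wl_relabel(labels: Dict[int, str], graph: Graph, iterations: int = 2) -> List[Dict[int, str]]:
    """Run WL relabeling; returns the label map after each iteration (index 0 = input)."""
    current = dict(labels)
    history = [dict(current)]
    for _ in range(iterations):
        new_labels = {}
        for node, neighbours in graph.items():
            # Sorting makes the signature independent of neighbour order (a multiset).
            neighbourhood = sorted(f"{bond}{current[nbr]}" for nbr, bond in neighbours)
            new_labels[node] = f"{current[node]}({','.join(neighbourhood)})"
        current = new_labels
        history.append(dict(current))
    return history


# Ethanol heavy atoms: C0-C1-O2 with single bonds.
ethanol = build_graph(3, [(0, 1, "-"), (1, 2, "-")])
steps = wl_relabel({0: "C", 1: "C", 2: "O"}, ethanol, iterations=1)
print(steps[1])  # the two carbons now carry different labels
```

After one iteration, the terminal carbon and the carbon bonded to oxygen are distinguished. Each further iteration widens the neighbourhood an atom "sees" by one bond, which is the property a neural WL variant exploits when scoring candidate reaction centers.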


LLM-Augmented Chemical Synthesis and Design Decision Programs

Wang, Haorui, Guo, Jeff, Kong, Lingkai, Ramprasad, Rampi, Schwaller, Philippe, Du, Yuanqi, Zhang, Chao

arXiv.org Artificial Intelligence

Retrosynthesis, the process of breaking down a target molecule into simpler precursors through a series of valid reactions, stands at the core of organic chemistry and drug development. Although recent machine learning (ML) research has advanced single-step retrosynthetic modeling and subsequent route searches, these solutions remain restricted by the extensive combinatorial space of possible pathways. Concurrently, large language models (LLMs) have exhibited remarkable chemical knowledge, hinting at their potential to tackle complex decision-making tasks in chemistry. In this work, we explore whether LLMs can successfully navigate the highly constrained, multi-step retrosynthesis planning problem. We introduce an efficient scheme for encoding reaction pathways and present a new route-level search strategy, moving beyond the conventional step-by-step reactant prediction. Through comprehensive evaluations, we show that our LLM-augmented approach excels at retrosynthesis planning and extends naturally to the broader challenge of synthesizable molecular design.


SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models

Sun, Kunyang, Bagni, Dorian, Cavanagh, Joseph M., Wang, Yingze, Sawyer, Jacob M., Gritsevskiy, Andrew, Head-Gordon, Teresa

arXiv.org Artificial Intelligence

Generative machine learning models for small-molecule drug discovery have shown immense promise, but many molecules generated by this approach are too difficult to synthesize to warrant further investigation or development. We present a novel approach that fine-tunes Meta's Llama 3 large language models (LLMs) to create SynLlama, which generates full synthetic pathways made of commonly accessible Enamine building blocks and robust organic reaction templates. SynLlama explores a large synthesizable space using significantly less data than other state-of-the-art methods and performs strongly in bottom-up synthesis, synthesizable analog generation, and hit expansion, offering medicinal chemists a valuable tool for drug discovery. We find that SynLlama can effectively generalize to unseen yet purchasable building blocks, meaning that its reconstruction capabilities extend to a broader synthesizable chemical space than the training data covers.


Predicting Chemical Reaction Outcomes Based on Electron Movements Using Machine Learning

Chen, Shuan, Park, Kye Sung, Kim, Taewan, Han, Sunkyu, Jung, Yousung

arXiv.org Artificial Intelligence

Accurately predicting chemical reaction outcomes and potential byproducts is a fundamental task of modern chemistry, enabling the efficient design of synthetic pathways and driving progress in chemical science. A reaction mechanism, which tracks electron movements during a chemical reaction, is critical for understanding reaction kinetics and identifying unexpected products. We demonstrate the high predictive performance of Reactron over existing product-only models on a large-scale reaction outcome prediction benchmark, as well as the model's ability to learn new reactivity when provided with a few examples. Furthermore, Reactron explores combinatorial reaction spaces, uncovering novel reactivities beyond its training data. With robust performance in both in- and out-of-distribution predictions, Reactron embodies human-like reasoning in chemistry and opens new frontiers in reaction discovery and synthesis design.

In organic chemistry, a reaction mechanism is a theoretical trajectory that describes how electrons move within organic molecules during a chemical reaction.
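One classic way to make "electron movements" computable is the Dugundji-Ugi bond-electron (BE) matrix. In this scheme, diagonal entries count nonbonding valence electrons on each atom and off-diagonal entries hold bond orders. A mechanistic step is then a reaction matrix R added to the reactant matrix, and R must sum to zero so that electrons are conserved. The abstract does not say that Reactron uses this representation. The sketch below is an assumed, self-contained illustration of the general idea, applied to the heterolytic cleavage of H-Cl, where the bonding pair moves onto chlorine.

```python
from typing import List

Matrix = List[List[int]]


def apply_electron_move(be: Matrix, r: Matrix) -> Matrix:
    """Apply a reaction matrix R to a bond-electron matrix (product = BE + R)."""
    n = len(be)
    if len(r) != n or any(len(row) != n for row in be + r):
        raise ValueError("matrices must be square and the same size")
    if any(r[i][j] != r[j][i] for i in range(n) for j in range(n)):
        raise ValueError("reaction matrix must be symmetric")
    if sum(map(sum, r)) != 0:
        raise ValueError("reaction matrix must conserve electrons (entries sum to 0)")
    product = [[be[i][j] + r[i][j] for j in range(n)] for i in range(n)]
    if any(v < 0 for row in product for v in row):
        raise ValueError("negative electron count: invalid electron movement")
    return product


def formal_charges(be: Matrix, valence: List[int]) -> List[int]:
    """Formal charge = free-atom valence electrons minus (lone-pair electrons + bond orders)."""
    return [valence[i] - sum(be[i]) for i in range(len(be))]


# Atoms: 0 = H, 1 = Cl. H-Cl single bond; Cl carries three lone pairs (6 electrons).
hcl = [[0, 1],
       [1, 6]]
# Arrow from the H-Cl bond to Cl: break the bond (-1, -1), Cl gains 2 electrons (+2).
cleave = [[0, -1],
          [-1, 2]]
ions = apply_electron_move(hcl, cleave)
print(ions, formal_charges(ions, valence=[1, 7]))
```

The total of the matrix entries (8 here, the valence electrons of H plus Cl) is unchanged by the move, and the product's formal charges come out as H+ and Cl-.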


A Transformer Model for Predicting Chemical Reaction Products from Generic Templates

Ozer, Derin, Lamprier, Sylvain, Cauchy, Thomas, Gutowski, Nicolas, Da Mota, Benoit

arXiv.org Artificial Intelligence

The accurate prediction of chemical reaction outcomes is a major challenge in computational chemistry. Current models rely heavily on either highly specific reaction templates or template-free methods, both of which present limitations. To address these limitations, this work proposes the Broad Reaction Set (BRS), a dataset featuring 20 generic reaction templates that allow efficient exploration of chemical space. Additionally, ProPreT5 is introduced, a T5 model tailored to chemistry that strikes a balance between rigid templates and template-free methods. ProPreT5 demonstrates its capability to generate accurate, valid, and realistic reaction products, making it a promising solution that goes beyond the current state of the art on the complex task of reaction product prediction.